Geometric methods for optimal sensor design


M.-A. Belabbas* 


Abstract 

An observer is an estimator of the state of a dynamical system from noisy sensor measurements. The need for observers is ubiquitous, with applications in fields ranging from engineering to biology to economics. The most widely used observer is the Kalman filter, which is known to be the optimal estimator of the state when the noise is additive and Gaussian. Because its performance is limited by the sensors to which it is paired, it is natural to seek an optimal sensor for the Kalman filter. The problem is however not convex and, as a consequence, many ad hoc methods have been used over the years to design sensors. We show in this paper how to characterize and obtain the optimal sensor for the Kalman filter. Precisely, we exhibit a positive definite operator which optimal sensors have to commute with. We furthermore provide a gradient flow to find optimal sensors, and prove the convergence of this gradient flow to the unique minimum in a broad range of applications. This optimal sensor yields the lowest possible estimation error for measurements with a fixed signal-to-noise ratio. The results presented here also apply to the dual problem of optimal actuator design.


1 Introduction 

Since the early work of Kalman, Bucy [18, 19] and Stratonovich [30], the estimation of linear systems has expanded its range of applications from its engineering roots [28] to fields such as environmental engineering, where for example it is used to estimate sea-level change [14]; financial engineering, where for example it is used to estimate the realized volatility error [5] or to price energy futures [24]; economics [2]; process control [26]; and even biology [16]. The common thread to these applications is that one cannot observe exactly all internal variables of a system, but instead needs to estimate them from partial, noisy measurements coming from a set of sensors.

We address in this paper the optimal design of such sensors. 

There are well-developed methods in control theory to design estimators of the state of a dynamical system based on sensor measurements; such estimators are called observers of a system. It stands to reason that as the signal-to-noise ratio of the measurements increases, the estimation error afforded by an observer decreases. The question of interest is thus to find which measurements with a given signal-to-noise ratio are optimal for an observer, optimal in the sense that the estimation error is minimized. Because the Kalman filter is the minimum mean square estimator of the state [1], optimal measurements for the Kalman filter yield the lowest estimation error which one can obtain for a given signal-to-noise ratio. We call the sensor providing such measurements optimal.

The optimal sensor design problem for Kalman filters is almost as old as the Kalman filter itself, and over the years a variety of methods have been proposed; we refer the reader to the recent thesis [27] for a survey. The major obstacle encountered is that the optimization problem defining an optimal sensor, formulated precisely below, is not convex. To sidestep this obstacle, suboptimal solutions obtained by way of convex relaxations or ad hoc heuristics for specific applications are often used [29, 26]. Another approach of choice is to focus on a convex performance measure [13, 10] or to optimize bounds for the estimation error [23]. There is also an extensive literature discussing the properties of, and numerical methods for, optimal sensor/actuator placement in infinite dimensional spaces; see [12, 25] and references therein.

In this paper, we provide an exact characterization of the optimal sensors for Kalman filters by exhibiting a positive definite matrix they have to commute with. We furthermore provide a gradient algorithm (in fact, a Lax equation [21]) to find such optimal sensors and prove its convergence to the global optimum in a broad range of situations. Finally, we demonstrate the efficacy of the methods proposed with simulations and provide a rule of thumb for choosing sensors that work best in low signal-to-noise ratio settings. We also believe that the geometric analysis provided here sheds light on the intrinsic difficulty of the problem, a difficulty that arises because the constraints on the number of observation signals and their signal-to-noise ratio are not convex. The optimal sensor design problem is equivalent to an optimal actuator placement problem, which we discuss in detail below.

Closely related problems which might benefit from the point of view presented here include the optimal scheduling and design of the measurements [16], the joint optimal measurement and control design [4] or the control of complex systems [23].

We now describe the problem and our results precisely. We start with a few conventions used throughout the paper. All square matrices are real $n \times n$ matrices unless otherwise specified. We denote by $I_n$ the $n \times n$ identity matrix, by $\Omega_{ij}$ the skew-symmetric matrix with zero entries everywhere except for the $ij$th and $ji$th ones, which are $1$ and $-1$ respectively, and by $\Sigma_{ij}$ the symmetric matrix with zero entries everywhere except for the $ij$th and $ji$th ones, which are both one. We simply say norm of a vector or matrix to refer to its Frobenius norm. For $J$ a differentiable function on a manifold and $X$ a vector field on the same manifold, we let $X \cdot J = dJ \cdot X$ be the directional derivative of $J$ along $X$. We denote by $\mathbb{R}^+$ the set of strictly positive real numbers.


1.1 Optimal sensor design. 

Consider the linear stochastic differential equation

$$\begin{aligned} dx(t) &= Ax(t)\,dt + G\,dw(t) \\ dy(t) &= cx(t)\,dt + dv(t) \end{aligned} \tag{1}$$

where $w(t)$ and $v(t)$ are independent Wiener processes and $A \in \mathbb{R}^{n \times n}$, $G \in \mathbb{R}^{n \times r}$, $c \in \mathbb{R}^{p \times n}$ and $d \in \mathbb{R}$. The process $x(t)$ is called the state process and $y(t)$ the observation process. The matrix $c$ is the sensing or observation matrix of the system. By an estimator for $x$ is meant a dynamical system with input $y(t)$ and whose state, call it $\hat{x}$, is an estimate of $x$.

The Kalman filter is the optimal, in the mean-squared sense, estimator of the state $x(t)$ given past observations $y([0,t])$. Given the matrices $A$, $G$ and $c$ as above, the Kalman filter in steady state is

$$d\hat{x}(t) = A\hat{x}(t)\,dt - Kc^T \left( dy(t) - c\hat{x}(t)\,dt \right) + bu(t)\,dt$$

where the matrix $K$ is the symmetric positive definite solution of the following Riccati equation:

$$KA^T + AK - Kc^T cK + GG^T = 0. \tag{2}$$

Not all sensing matrices are equal for the purpose of estimation. In fact, it is not too hard to convince oneself that as the norm of $c$ increases, all other things being equal, the estimation error will decrease [22]. Keeping these observations in mind, it is natural to seek the best sensing matrix of a given norm. To make the statement more precise, denote by $\mathbb{E}$ the expectation operator. One can show that the gain matrix $K$ is also the steady-state covariance of the estimation error [1], $K = \mathbb{E}\left( (x - \hat{x})(x - \hat{x})^T \right)$, and thus the trace of $K$ is nothing else than the mean squared estimation error:

$$\operatorname{tr}(K) = \sum_i \mathbb{E}\left( (x_i - \hat{x}_i)^2 \right).$$

We are thus led to the following optimal sensor design problem: minimize the trace of $K$, where $K$ obeys Eq. (2), over $c$ of fixed norm.
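
As a numerical illustration of the problem just stated, the sketch below solves the Riccati equation (2) with SciPy and evaluates $\operatorname{tr}(K)$ for a sensing matrix $c$ and for the rescaled matrix $2c$. The system matrices and the helper name `steady_state_error` are hypothetical choices made for this example; they do not come from the paper.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def steady_state_error(A, G, c):
    """Return (tr K, K), where K solves K A^T + A K - K c^T c K + G G^T = 0."""
    p = c.shape[0]
    # SciPy solves a^T X + X a - X b r^{-1} b^T X + q = 0,
    # so Eq. (2) corresponds to a = A^T, b = c^T, q = G G^T, r = I_p.
    K = solve_continuous_are(A.T, c.T, G @ G.T, np.eye(p))
    return np.trace(K), K

# Hypothetical stable 3-state system with a scalar sensor (p = 1).
A = np.array([[-1.0, 0.5, 0.0],
              [0.0, -2.0, 0.3],
              [0.2, 0.0, -3.0]])
G = np.eye(3)
c = np.array([[1.0, 0.0, 0.0]])

err_c, K = steady_state_error(A, G, c)
err_2c, _ = steady_state_error(A, G, 2.0 * c)

# Residual of Eq. (2); it should vanish up to solver accuracy.
residual = K @ A.T + A @ K - K @ c.T @ c @ K + G @ G.T
print(err_c, err_2c, np.abs(residual).max())
```

In this example, doubling the norm of $c$ lowers the trace, which is why the norm of $c$ has to be fixed for the design problem to be meaningful.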


1.2 Optimal actuator design.

In view of the well-known duality between observability and controllability, it is not surprising that the optimal actuator design problem takes a formulation similar to that of optimal sensor design. To wit, consider the linear time-invariant system

$$\dot{x} = Ax + bu. \tag{3}$$

An optimal linear quadratic controller is a controller which minimizes the cost functional

$$J(x) = \int_0^\infty \left( x(t)^T Q x(t) + u^2(t) \right) dt,$$

given $x(0) = x$ and for a user-selected positive definite matrix $Q$. It is known that the optimal controller is a feedback controller of the form $u = -b^T K x$ where $K$ obeys the Riccati equation

$$A^T K + KA - Kbb^T K + Q = 0.$$

One can show that starting from an initial condition $x_0$, the "cost of return to zero" with the above controller is $J(x_0) = x_0^T K x_0$. A simple calculation shows that the expected cost of return to zero for an initial condition $x$ distributed according to an arbitrary rotationally invariant distribution with density $g(r)\,dr$, where $r = \|x\|$, is

$$\mathbb{E}J = \operatorname{tr}(K) \int_0^\infty g(r)\,dr.$$

The question thus arises of finding the actuator $b$ that minimizes the trace of $K$. This actuator is the one returning the system to its desired state with the least effort on average. As the norm of $b$ increases, the trace of $K$ decreases and we shall thus fix the norm of $b$. By optimizing a broader class of functions below, our results also handle non-rotationally invariant distributions on the initial conditions.
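
The identity $J(x_0) = x_0^T K x_0$ can be checked numerically. The sketch below, written for a hypothetical three-state system, solves the control Riccati equation with SciPy and compares $K$ with the matrix $P$ giving the cost accumulated along the closed-loop trajectory, which solves a Lyapunov equation. All numerical values are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Hypothetical stable system with a single actuator b.
A = np.array([[-1.0, 0.5, 0.0],
              [0.0, -2.0, 0.3],
              [0.2, 0.0, -3.0]])
b = np.array([[0.0], [1.0], [0.0]])
Q = np.eye(3)

# A^T K + K A - K b b^T K + Q = 0 (SciPy form with a = A, b = b, r = 1).
K = solve_continuous_are(A, b, Q, np.eye(1))

# Closed-loop matrix under the feedback u = -b^T K x.
F = A - b @ b.T @ K

# The closed-loop cost is x0^T P x0 with F^T P + P F + (Q + K b b^T K) = 0,
# since the integrand is x^T Q x + u^2 = x^T (Q + K b b^T K) x.
P = solve_continuous_lyapunov(F.T, -(Q + K @ b @ b.T @ K))

x0 = np.array([1.0, -1.0, 0.5])
cost_riccati = x0 @ K @ x0
cost_closed_loop = x0 @ P @ x0
print(cost_riccati, cost_closed_loop)
```

The two costs agree because $K$ itself satisfies the closed-loop Lyapunov equation, whose solution is unique when $F$ is stable.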

1.3 Main results. 

Having shown that optimal sensor and actuator design can both be cast as minimizing the trace of the positive definite solution of the Riccati equation, we pose the following optimization problem, which slightly generalizes the statement introduced above. We adopt the point of view of sensor design, and thus look for an optimal sensing matrix $c$. Without loss of generality, we will represent a sensing matrix by $\sqrt{\gamma}\,c$, where $\|c\|^2 = \operatorname{tr}(cc^T) = p$ and $\gamma > 0$. Throughout this paper, we use the notation

$$C := c^T c.$$

With a slight abuse of language, which will be justified below, we will refer to both $C$ and $c$ as observation matrices. We call an observation matrix $c$ orthonormal if $cc^T = I_p$, where $I_p$ is the $p \times p$ identity matrix. We say that $C$ is orthonormal if it can be written as $C = c^T c$ with $c$ orthonormal. For $p = 1$, every observation vector is orthonormal and the spectral decomposition of $C$ yields $c$ unambiguously. When $p > 1$, $C$ defines an orthonormal $c \in \mathbb{R}^{p \times n}$ up to a $p$-dimensional rotation, since for any $\Theta \in \mathbb{R}^{p \times p}$ with $\Theta^T \Theta = I_p$, we have $C = c^T c = (\Theta c)^T (\Theta c)$.

We define the cost function, for $L$ and $Q$ positive definite matrices,

$$J(\gamma, c) := \operatorname{tr}(LK) \tag{4}$$

where $K$ satisfies the Riccati equation

$$AK + KA^T - \gamma Kc^T cK + Q = 0.$$

Note that $J(\gamma, \Theta c) = J(\gamma, c)$. We call an observation vector $c$ optimal if it is a global minimizer of $J(\gamma, c)$ for $\gamma$ fixed, and we call it extremal if it is a singular point of $J(\gamma, c)$, but not necessarily a minimum. We let $[A, B] = AB - BA$ be the commutator of two matrices $A$ and $B$. A square matrix is called stable if its eigenvalues have strictly negative real parts. For $c \in \mathbb{R}^{p \times n}$, we denote by $\operatorname{span} c$ the subspace of $\mathbb{R}^n$ spanned by the rows of $c$. A subspace $V$ of $\mathbb{R}^n$ is called an invariant subspace of $M \in \mathbb{R}^{n \times n}$ if $MV \subset V$. If $M$ is symmetric positive definite, all its $p$-dimensional invariant subspaces are spanned by $p$ eigenvectors of $M$. We refer to the $p$-dimensional invariant subspace of $M$ spanned by the eigenvectors corresponding to the $p$ largest eigenvalues as the highest $p$-dimensional invariant subspace of $M$.

The main results of the paper are summarized below:

1. An observation matrix $c \in \mathbb{R}^{p \times n}$ is extremal if $\operatorname{span} c$ is an eigenspace of the positive definite matrix

$$M := KRK \tag{5}$$

where $K$ and $R$ are the positive definite solutions of the equations

$$A^T K + KA - \gamma KCK + Q = 0$$
$$(A - \gamma CK)R + R(A - \gamma CK)^T + L = 0.$$

2. Generically for $L$, $Q$ symmetric positive definite, for $p = 1$, $\gamma > 0$ small and $A$ stable, there is a unique (up to a sign) optimal observation matrix $c$.

3. With the same assumptions as in item 2 but for $p > 1$, there is a unique optimal orthonormal observation matrix $C = c^T c$ of rank $p$; it is such that $\operatorname{span} c$ is the highest $p$-dimensional invariant subspace of $M$.

4. With the same assumptions as in item 2, the differential equation

$$\dot{C} = [C, [C, M]],$$

with $K$, $R$ as above, converges from a set of measure one of initial conditions to an optimal observation matrix.


2 Optimal sensor and actuator placement. 

On the Riccati equation. The Riccati equation plays a central role in the theory of linear systems, and much has been written about its properties. We only mention here, and without proof, the facts needed to prove our results. A pair $(A, c)$ is called detectable if there exists a matrix $D$ such that $A - c^T D$ is stable. If the pair $(A, c)$ is detectable and $Q$ is positive definite, the Riccati equation $A^T K + KA - KCK + Q = 0$ has a unique positive-definite solution. Moreover, this solution is such that $A - CK$ is a stable matrix [1]. In this paper, we will restrict our attention to stable matrices $A$, in which case the pair $(A, c)$ is detectable regardless of $c$. We discuss this assumption in the last section. We gather the facts needed in the following result, which is essentially [11].

Lemma 1. Let $A$ be a stable matrix and $Q$ a symmetric positive definite matrix. Let $C \in \mathbb{R}^{n \times n}$ be symmetric positive definite with $\|C\| = 1$ and let $\gamma > 0$. Then the positive definite solution $K$ of the Riccati equation $A^T K + KA - \gamma KCK + Q = 0$ is analytic with respect to $C$ and $\gamma$.

Real projective space and isospectral matrices. Denote by $SO(n)$ the special orthogonal group, that is the set of matrices $\Theta \in \mathbb{R}^{n \times n}$ such that $\Theta^T \Theta = I_n$ and $\det(\Theta) = 1$. We denote by $\mathfrak{so}(n) = \{\Omega \in \mathbb{R}^{n \times n} \mid \Omega^T = -\Omega\}$ the Lie algebra of $SO(n)$ and we use the notation

$$\operatorname{ad}_C A := [C, A] := CA - AC.$$

Let $\Lambda$ be a diagonal matrix. We denote by $\mathrm{Sym}(\Lambda)$ the orbit of the special orthogonal group $SO(n)$ acting on $\Lambda$ by conjugation; that is

$$\mathrm{Sym}(\Lambda) = \{ C \in \mathbb{R}^{n \times n} \mid C = \Theta^T \Lambda \Theta \text{ for } \Theta \in SO(n) \}.$$

The set $\mathrm{Sym}(\Lambda)$ is the set of all real symmetric matrices which can be diagonalized to $\Lambda$. We call $\mathrm{Sym}(\Lambda)$ an isospectral manifold. A simple computation shows that its tangent space $T_C\mathrm{Sym}(\Lambda)$ at a point $C$ is the following vector space:

$$T_C\mathrm{Sym}(\Lambda) = \{ [C, \Omega] = \operatorname{ad}_C \Omega \mid \Omega \in \mathfrak{so}(n) \}. \tag{6}$$

We will only consider here the case of $\Lambda$ having all entries zero or one. Since we clearly have that $\mathrm{Sym}(\Lambda) = \mathrm{Sym}(\Lambda')$ if and only if $\Lambda$ and $\Lambda'$ are conjugate, we can define unambiguously $\mathrm{Sym}(n, p)$ to be the isospectral manifold with $\Lambda$ having $p$ entries one and $n - p$ entries zero on the diagonal. Note that if $C \in \mathrm{Sym}(n, p)$, then $C^2 = C$ and $C$ is of rank $p$. Thus $\mathrm{Sym}(n, p)$ can be thought of as the space of rank $p$ orthogonal projectors in $\mathbb{R}^n$. The dimension of $\mathrm{Sym}(n, p)$ is easily seen to be

$$\dim \mathrm{Sym}(n, p) = np - p^2.$$

In particular, $\mathrm{Sym}(n, 1)$ is homeomorphic to the real projective space $\mathbb{RP}(n - 1)$.
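
The properties just listed are easy to confirm numerically. The following sketch, with arbitrarily chosen $n = 5$ and $p = 2$, builds a point of $\mathrm{Sym}(n, p)$ by conjugating a diagonal 0/1 matrix, then computes the dimension of the tangent space of Eq. (6) as the rank of the map $\Omega \mapsto [C, \Omega]$ over a basis of skew-symmetric matrices. The helper `skew_basis` is a name introduced here for illustration.

```python
import numpy as np
from itertools import combinations

def skew_basis(n):
    """Basis of so(n): matrices with +1 at (i, j) and -1 at (j, i), i < j."""
    basis = []
    for i, j in combinations(range(n), 2):
        Om = np.zeros((n, n))
        Om[i, j], Om[j, i] = 1.0, -1.0
        basis.append(Om)
    return basis

n, p = 5, 2
rng = np.random.default_rng(0)
# Random orthogonal matrix; the sign of its determinant is immaterial here
# because the diagonal 0/1 matrix is fixed by diagonal sign flips.
Theta, _ = np.linalg.qr(rng.standard_normal((n, n)))
Lam = np.diag([1.0] * p + [0.0] * (n - p))
C = Theta.T @ Lam @ Theta

# Stack the tangent vectors [C, Omega] as rows and take the rank.
tangent = np.array([(C @ Om - Om @ C).ravel() for Om in skew_basis(n)])
dim = np.linalg.matrix_rank(tangent)
print(dim, n * p - p * p)
```

The rank equals $np - p^2$, and $C$ is an orthogonal projector of rank $p$, as stated above.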

We now define the function (see Eq. (4))

$$\bar{J}(\gamma, C) : \mathbb{R}^+ \times \mathrm{Sym}(n, p) \to \mathbb{R} : (\gamma, C) \mapsto \operatorname{tr}(LK)$$

where $K$ is the positive definite solution to the Riccati equation of Lemma 1. With a slight abuse of notation, we will omit the bar over $J$ and write $J(\gamma, C)$ as well.

The normal metric. The manifold $\mathrm{Sym}(\Lambda)$ possesses a natural metric called the normal metric or Einstein metric. The main idea behind the definition of the normal metric, which has already been used in engineering applications [15, 8, 9], is to embed $\mathrm{Sym}(\Lambda)$ in the Lie algebra $\mathfrak{su}(n)$ and use the so-called Killing form [20] on $\mathfrak{su}(n)$. Note that because $\Lambda$ is not an element of $\mathfrak{so}(n)$, $\mathrm{Sym}(\Lambda)$ is not an adjoint orbit [3] of $SO(n)$. Furthermore, we wish to include the case of $\Lambda$ having repeated entries, which implies that the operator $[C, \cdot]$ (or $\operatorname{ad}_C$, as defined above) acting on $\mathfrak{so}(n)$ is not invertible. We briefly sketch a construction of the normal metric here that emphasizes the properties we shall need below. We refer the reader to [3, 11] for a more careful construction.

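Denote by $\operatorname{im} \operatorname{ad}_C$ the image of $\operatorname{ad}_C$ and by $\ker \operatorname{ad}_C \subset \mathfrak{so}(n)$ the kernel of $\operatorname{ad}_C$. From the definition of $T_C\mathrm{Sym}(\Lambda)$, we see that $\operatorname{im} \operatorname{ad}_C = T_C\mathrm{Sym}(\Lambda)$. The bilinear operator

$$\kappa : \mathfrak{so}(n) \times \mathfrak{so}(n) \to \mathbb{R} : (\Omega_1, \Omega_2) \mapsto -\operatorname{tr}(\Omega_1 \Omega_2)$$

is symmetric and positive definite. It can thus be used to define the orthogonal complement $(\ker \operatorname{ad}_C)^\perp$ of $\ker \operatorname{ad}_C$ in $\mathfrak{so}(n)$, which we identify with $\mathfrak{so}(n)/\ker \operatorname{ad}_C$. Using these facts, we can define the invertible map

$$\operatorname{ad}_C : \mathfrak{so}(n)/\ker \operatorname{ad}_C \to \operatorname{im} \operatorname{ad}_C.$$

The normal metric $\kappa_n$ is defined, for $X, Y \in T_C\mathrm{Sym}(\Lambda)$, as

$$\kappa_n(X, Y) := -\operatorname{tr}\left( \operatorname{ad}_C^{-1} X \, \operatorname{ad}_C^{-1} Y \right). \tag{7}$$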

One can show that the normal metric is positive definite and non-degenerate. Moreover, we have that

$$\operatorname{ad}_C \operatorname{ad}_C^{-1} X = X. \tag{8}$$

Another property we shall need below is the ad-invariance of the trace, which refers to the following relation:

$$\operatorname{tr}\left( (\operatorname{ad}_C \Omega_1) \Omega_2 \right) = -\operatorname{tr}\left( \Omega_1 \operatorname{ad}_C \Omega_2 \right). \tag{9}$$

We conclude this section by describing an orthonormal basis of $T_C\mathrm{Sym}(n, p)$.

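Lemma 2. Let $1 \le p < n$ and let $\mathcal{M} = \{m_1, m_2, \ldots, m_p\}$ with $1 \le m_1 < m_2 < \cdots < m_p \le n$, all integers. Denote by $\bar{\mathcal{M}}$ the complement of $\mathcal{M}$ in $\{1, 2, \ldots, n\}$. Let $E \in \mathrm{Sym}(n, p)$ be the matrix with zero entries except for the diagonal entries $(i, i)$, $i \in \mathcal{M}$, which are one, that is

$$E = \sum_{i \in \mathcal{M}} \Sigma_{ii}.$$

Then the matrices $\frac{1}{\sqrt{2}} \operatorname{ad}_E \Omega_{ij}$ for $i \in \mathcal{M}$ and $j \in \bar{\mathcal{M}}$ form an orthonormal basis of $T_E\mathrm{Sym}(n, p)$.

Proof. Recall that the tangent space at $E$ is spanned by the matrices $\operatorname{ad}_E \Omega$ for $\Omega \in \mathfrak{so}(n)$. Note that the $\Omega_{ij}$, for $i \neq j$, span $\mathfrak{so}(n)$. Hence, to show that the $\operatorname{ad}_E \Omega_{ij}$ with $i \in \mathcal{M}$ and $j \in \bar{\mathcal{M}}$ span the tangent space, it is sufficient to show that $\Omega_{ij} \in \ker \operatorname{ad}_E$ if and only if the conditions $i \in \mathcal{M}$ and $j \in \bar{\mathcal{M}}$, or the same conditions with $i$ and $j$ exchanged, are not satisfied. But a short calculation shows that

$$[\Sigma_{kk}, \Omega_{ij}] = \begin{cases} \Sigma_{ij} & \text{if } i = k \\ -\Sigma_{ij} & \text{if } j = k \\ 0 & \text{otherwise.} \end{cases} \tag{10}$$

We conclude from (10) that $\operatorname{ad}_E \Omega_{ij} \neq 0$ only if either $i \in \mathcal{M}, j \in \bar{\mathcal{M}}$ or $i \in \bar{\mathcal{M}}, j \in \mathcal{M}$. Since $\Omega_{ij} = -\Omega_{ji}$, the vectors $\operatorname{ad}_E \Omega_{ij}$ with $i \in \mathcal{M}$, $j \in \bar{\mathcal{M}}$ span $T_E\mathrm{Sym}(n, p)$. We now show that these vectors, once scaled by $1/\sqrt{2}$, are orthonormal for the normal metric. Again, a straightforward calculation shows that

$$\kappa_n(\operatorname{ad}_E \Omega_{ij}, \operatorname{ad}_E \Omega_{kl}) = -\operatorname{tr}(\Omega_{ij} \Omega_{kl}) = 2\delta_{ik}\delta_{jl},$$

where $\delta_{ik} = 1$ if $i = k$ and $0$ otherwise. This proves the claim. □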

2.1 Gradient flow for optimal sensor design 

We now evaluate the gradient flow of $J = \operatorname{tr}(LK)$ with respect to the normal metric. Fix $C \in \mathrm{Sym}(n, p)$ such that $(A, C)$ is detectable and let $K$ be the corresponding positive definite solution of the Riccati equation. Recall that the gradient of $J$ evaluated at $C$, denoted by $\nabla J(C)$, obeys the relation [11]

$$\kappa_n(\nabla J(C), X) = dJ \cdot X, \quad \text{for all } X \in T_C\mathrm{Sym}(\Lambda). \tag{11}$$


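Let $C(t)$ be a differentiable curve in $\mathrm{Sym}(\Lambda)$ defined for $|t| < \varepsilon$ small and such that $C(0) = C$ and $\frac{d}{dt}\big|_{t=0} C(t) = X$. We can choose $\varepsilon$ small enough so that $(A, C(t))$ is detectable for all $|t| < \varepsilon$. From Lemma 1, we conclude that for all such $t$, there exists a unique positive definite solution $K(t)$ to the algebraic Riccati equation $A^T K(t) + K(t)A - \gamma K(t)C(t)K(t) + Q = 0$ and that the curve $K(t)$ is differentiable in $t$. Then

$$dJ \cdot X = \frac{d}{dt}\Big|_{t=0} J(C(t)) = \operatorname{tr}\left( L \frac{d}{dt}\Big|_{t=0} K(t) \right). \tag{12}$$

Differentiating the Riccati equation, and writing $\dot{K}$ for $\frac{d}{dt}\big|_{t=0} K$, we obtain

$$A^T \dot{K} + \dot{K}A - \gamma \dot{K}CK - \gamma KXK - \gamma KC\dot{K} = 0.$$

The above equation is a Lyapunov equation [1], which we can write as

$$(A - \gamma CK)^T \dot{K} + \dot{K}(A - \gamma CK) - \gamma KXK = 0 \tag{13}$$

and whose solution is

$$\dot{K} = -\gamma \int_0^\infty e^{(A - \gamma CK)^T t} KXK e^{(A - \gamma CK)t} \, dt. \tag{14}$$

Using the definition of $\kappa_n$ from Eq. (7), we obtain by plugging (14) into (11)

$$\operatorname{tr}\left( \operatorname{ad}_C^{-1} \nabla J \, \operatorname{ad}_C^{-1} X \right) = \gamma \operatorname{tr}\left( L \int_0^\infty e^{(A - \gamma CK)^T t} KXK e^{(A - \gamma CK)t} \, dt \right).$$

From (6), we can write $X = \operatorname{ad}_C \Omega$ for some $\Omega \in (\ker \operatorname{ad}_C)^\perp$. Using the cyclic and ad-invariance properties of the trace, the last equation can be rewritten as

$$\operatorname{tr}\left( (\operatorname{ad}_C^{-1} \nabla J) \Omega \right) = \gamma \operatorname{tr}\left( \operatorname{ad}_C \left( K \int_0^\infty e^{(A - \gamma CK)t} L e^{(A - \gamma CK)^T t} \, dt \, K \right) \Omega \right).$$

The above equation holds for all $\Omega \in (\ker \operatorname{ad}_C)^\perp$ and thus

$$\operatorname{ad}_C^{-1} \nabla J = \gamma \operatorname{ad}_C \left( K \int_0^\infty e^{(A - \gamma CK)t} L e^{(A - \gamma CK)^T t} \, dt \, K \right) + \Delta$$

for some $\Delta \in ((\ker \operatorname{ad}_C)^\perp)^\perp = \ker \operatorname{ad}_C$. Taking $\operatorname{ad}_C$ on both sides of the last relation, we obtain $\nabla J = \gamma \operatorname{ad}_C \operatorname{ad}_C KRK$, where $R$ is the solution of the Lyapunov equation $(A - \gamma CK)R + R(A - \gamma CK)^T + L = 0$. We summarize these calculations in the following Theorem: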
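
As a sanity check on the direction of the flow, the following worked computation uses only Eqs. (12) and (14), the cyclic property of the trace, and the integral representation $R = \int_0^\infty e^{(A - \gamma CK)t} L e^{(A - \gamma CK)^T t}\,dt$ of the solution of the Lyapunov equation. It is a side calculation added for illustration, with $M = KRK$, and shows that $J$ does not increase along a curve satisfying $\dot{C} = \gamma [C, [C, M]]$.

```latex
\begin{align*}
dJ \cdot X
  &= \operatorname{tr}(L\dot{K})
   = -\gamma \operatorname{tr}\Big( L \int_0^\infty e^{(A-\gamma CK)^T t}\, KXK\, e^{(A-\gamma CK)t}\,dt \Big) \\
  &= -\gamma \operatorname{tr}\Big( K \int_0^\infty e^{(A-\gamma CK)t}\, L\, e^{(A-\gamma CK)^T t}\,dt \; K X \Big)
   = -\gamma \operatorname{tr}(MX). \\[4pt]
\text{With } X = \gamma [C,[C,M]]:\quad
\frac{d}{dt} J
  &= -\gamma^2 \operatorname{tr}\big( M [C,[C,M]] \big)
   = \gamma^2 \operatorname{tr}\big( [C,M]^2 \big)
   = -\gamma^2 \,\big\| [C,M] \big\|^2 \;\le\; 0 .
\end{align*}
```

The last equality holds because $[C, M]$ is skew-symmetric when $C$ and $M$ are symmetric, so $\operatorname{tr}([C,M]^2) = -\|[C,M]\|^2$ in the Frobenius norm. The cost is therefore stationary exactly when $[C, M] = 0$.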

Theorem 1. The gradient flow of the function $J(\gamma, C) = \operatorname{tr}(LK)$ with respect to the normal metric and for $\gamma > 0$ fixed is

$$\dot{C} = \gamma [C, [C, M]]$$

where $M = KRK$ and $K$, $R$ obey the equations

$$\begin{aligned} A^T K + KA + Q - \gamma KCK &= 0 \\ (A - \gamma CK)R + R(A - \gamma CK)^T + L &= 0. \end{aligned} \tag{15}$$

Moreover, an observation matrix $C \in \mathrm{Sym}(n, p)$ is extremal if it is an orthogonal projection onto a $p$-dimensional invariant subspace of $M$. Equivalently, an orthonormal observation matrix $c \in \mathbb{R}^{p \times n}$ is extremal if $\operatorname{span} c$ is an eigenspace of $M$.

Proof. The first part of the statement was proven above. We thus focus on the second part. Recall that extremal points of $J$ are zeros of its gradient, and thus $C$ is extremal if and only if $[C, M] = 0$. Because the positive definite solution of the Riccati equation is such that the matrix $(A - \gamma CK)$ is stable and because $L$ is symmetric positive definite, we have that $R$ is positive definite and thus so is the product $KRK =: M$. The result is now a consequence of the fact that symmetric matrices commute if and only if they have the same invariant subspaces. □
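
The flow of Theorem 1 can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the scheme used for the paper's simulations: it uses a forward Euler step on $\dot{C} = [C, [C, M]]$, followed by a re-projection onto $\mathrm{Sym}(n, p)$ through an eigendecomposition, and the system matrices, step size and helper names (`project`, `cost_and_M`, `double_bracket_flow`) are hypothetical choices.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

def project(S, p):
    """Retract a symmetric matrix onto Sym(n, p) using its top-p eigenvectors."""
    _, V = np.linalg.eigh((S + S.T) / 2)
    c = V[:, -p:].T                      # p x n matrix with orthonormal rows
    return c.T @ c, c

def cost_and_M(A, Q, L, C, gamma, p):
    """Return J = tr(L K) and M = K R K, with K and R as in Eq. (15)."""
    _, c = project(C, p)
    # A^T K + K A - gamma K C K + Q = 0, via b = sqrt(gamma) c^T and r = I_p.
    K = solve_continuous_are(A, np.sqrt(gamma) * c.T, Q, np.eye(p))
    F = A - gamma * C @ K
    # F R + R F^T + L = 0.
    R = solve_continuous_lyapunov(F, -L)
    return np.trace(L @ K), K @ R @ K

def bracket(X, Y):
    return X @ Y - Y @ X

def double_bracket_flow(A, Q, L, C0, gamma, p, step=2.0, iters=500):
    """Euler discretization of dC/dt = [C, [C, M]] with re-projection."""
    C, _ = project(C0, p)
    costs = []
    for _ in range(iters):
        J, M = cost_and_M(A, Q, L, C, gamma, p)
        costs.append(J)
        C, _ = project(C + step * bracket(C, bracket(C, M)), p)
    return C, costs

# Hypothetical stable system, scalar observation (p = 1), small gamma.
A = np.array([[-1.0, 0.5, 0.0],
              [0.0, -2.0, 0.3],
              [0.2, 0.0, -3.0]])
Q, L, gamma, p = np.eye(3), np.eye(3), 0.1, 1
C0 = np.ones((3, 3)) / 3.0               # projector onto (1, 1, 1) / sqrt(3)

C, costs = double_bracket_flow(A, Q, L, C0, gamma, p)
_, M = cost_and_M(A, Q, L, C, gamma, p)
print(costs[0], costs[-1], np.linalg.norm(bracket(C, M)))
```

In this example the iteration lowers the cost and ends at a point where $C$ commutes with $M$ and projects onto the eigenvector of $M$ with the largest eigenvalue, in line with the statement of the theorem.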

Remark 1. It is tempting to conjecture that if $c_1$ is an extremal observation vector, and $K_1$ and $R_1$ are the corresponding solutions of Eq. (15) above, then any eigenvector of $K_1 R_1 K_1$ is also extremal. This however is not the case.

2.2 The extremal points of J 

We have derived in the previous section the gradient of $J$. Because $J$ is a lower-bounded function defined on a compact domain, it is clear that the gradient flow will converge to the set of extremal points of $J$. However, $J$ is not convex and thus we do not have, a priori, convergence to the global minimum of $J$. We show that for $\gamma$ small $J$ has a unique minimum, a unique maximum and that the other extremal points are finite in number and saddle points. This shows that, in that regime, the gradient flow will essentially converge to the global minimum. We will discuss in the last section how small $\gamma$ needs to be in practice.

We prove the result in two steps: first, we show that there is a finite number of extremal points for $\gamma$ small and then we evaluate their signatures. Recall that the signature of an extremal point $C$ of $J$ is a triplet of integers $(n_+, n_-, n_0)$, where $n_+$ (resp. $n_-$ and $n_0$) denotes the number of positive (resp. negative, zero) eigenvalues of the Hessian of $J$ at $C$. The proof of the first item goes by studying the parametrized family of vector fields

$$F(\gamma, C) = [C, M].$$

When $\gamma > 0$, $F$ and $\nabla J$ clearly have the same zeros. We then show that $F(0, C)$ has exactly $\binom{n}{p}$ zeros and that these zeros persist for $\gamma > 0$ small. We denote by $\{C_i(\gamma)\}$, $i \in \mathcal{I}(\gamma)$, the set of zeros of $F(\gamma, C)$, where the index set $\mathcal{I}(\gamma)$ is possibly infinite.

J has a finite number of extremal points. Let $\langle \cdot, \cdot \rangle$ be the normal metric on $\mathrm{Sym}(n, p)$. Recall that the Levi-Civita connection is the unique connection that is compatible with the metric and torsion free [3]; we denote it by $\nabla$.

Proposition 1. Let $A$ be a stable matrix. For $\gamma > 0$ small and generically for $Q$, $L$ positive definite matrices, the function

$$J(\gamma, C) : \mathbb{R}^+ \times \mathrm{Sym}(n, p) \to \mathbb{R} : J(\gamma, C) = \operatorname{tr}(LK),$$

where $K$ satisfies the Riccati equation (15), has exactly $\binom{n}{p}$ extremal points.

Proof. We introduce the following vector field:

$$F : [0, \infty) \times \mathrm{Sym}(n, p) \to T\mathrm{Sym}(n, p) : (\gamma, C) \mapsto [C, M]$$

where $M = KRK$ with $R$ and $K$ obeying Eq. (15). One should think of $F(\gamma, C)$ as a parametrized family of vector fields on $\mathrm{Sym}(n, p)$. We denote by $K_0$ and $R_0$ the solutions of

$$A^T K + KA + Q = 0$$

and

$$AR + RA^T + L = 0$$

respectively. Set $M_0 := K_0 R_0 K_0$. For $\gamma = 0$ and generically for $Q$ and $L$ positive definite, we conclude from Lemma 6 (see Appendix) that $M_0$ has $n$ distinct eigenvalues. Because symmetric matrices commute if and only if they have the same eigenvectors, there are exactly $\binom{n}{p}$ matrices $C \in \mathrm{Sym}(n, p)$ which commute with $M_0$. Thus the index set $\mathcal{I}(0)$ contains $\binom{n}{p}$ elements, say $\mathcal{I}(0) = \{1, 2, \ldots, \binom{n}{p}\}$. Denote by $C_i(0)$ the corresponding zeros of $F$.

Recall that $\nabla F$ is the covariant derivative of $F$ where $\nabla$ is the Levi-Civita connection associated to the normal metric. In order to show that for $\gamma > 0$ small enough, $\mathcal{I}(\gamma) = \mathcal{I}(0)$, it is sufficient to show that the linear map $\nabla F : T_C\mathrm{Sym}(n, p) \to T_C\mathrm{Sym}(n, p) : X \mapsto \nabla_X F(0, C_i)$ is non-degenerate at the $\binom{n}{p}$ points $(0, C_i(0))$. From Lemma 5 and for $\Omega_x$ such that $X = [C, \Omega_x]$, we have

$$\nabla_X F = -\frac{1}{2} \left( [M_0, [C, \Omega_x]] + [\Omega_x, [C, M_0]] \right).$$

When $C = C_i(0)$ for some $i$, the second term vanishes. We are left with

$$\nabla_X F(0, C_i) = -\frac{1}{2} [M_0, [C, \Omega_x]].$$

We now show that the covariant derivative is non-degenerate. For this, we need the two following facts: first, for any orthogonal matrix $\Theta \in SO(n)$, the conjugation map

$$\operatorname{Ad}_\Theta : \mathfrak{so}(n) \to \mathfrak{so}(n) : \Omega \mapsto \Theta^{-1} \Omega \Theta$$

has $\operatorname{Ad}_{\Theta^{-1}}$ for inverse and is consequently surjective onto $\mathfrak{so}(n)$. Second, for arbitrary matrices $\Theta \in SO(n)$ and $A, B \in \mathbb{R}^{n \times n}$,

$$\operatorname{Ad}_\Theta [A, B] = [\operatorname{Ad}_\Theta A, \operatorname{Ad}_\Theta B]. \tag{16}$$

Using these facts, we conclude that $\nabla_X F(0, C_i)$ is non-degenerate if and only if the linear map

$$X \mapsto \operatorname{Ad}_\Theta \nabla_X F(0, C_i) = [\Theta M_0 \Theta^{-1}, [\Theta C \Theta^{-1}, \Omega_x]]$$

is non-degenerate.

Because $M_0$ has exactly $n$ orthonormal eigenvectors, we can let $\Theta$ be the orthogonal matrix whose columns are eigenvectors of $M_0$. With this choice of $\Theta$, the previous equation reduces to

$$\operatorname{Ad}_\Theta \nabla_X F(0, C_i) = [D, [E, \Omega_x]] = \operatorname{ad}_D \operatorname{ad}_E \Omega_x,$$

where $E$ is a matrix with zero entries except for $p$ diagonal entries which are equal to 1 and $D$ is a diagonal matrix with the eigenvalues of $M_0$ on its diagonal. A short calculation shows that the commutator $\operatorname{ad}_D A$ of a diagonal matrix $D$ with diagonal entries $d_i$ and a matrix $A = (a_{ij})$ has entry $ij$ equal to $(d_i - d_j) a_{ij}$. Since the $d_i$ are distinct, we deduce that $\operatorname{ad}_D$ is full rank. Thus $\operatorname{Ad}_\Theta \nabla_X F(0, C_i)$ is of full rank. □
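
The counting step at $\gamma = 0$ can be illustrated numerically. The sketch below computes $K_0$, $R_0$ and $M_0 = K_0 R_0 K_0$ for a hypothetical stable system, then enumerates the projectors built from $p$ eigenvectors of $M_0$ and checks that each commutes with $M_0$. It only exhibits these $\binom{n}{p}$ zeros of $F(0, \cdot)$; that there are no others relies on the distinct-eigenvalue argument in the proof. The matrices used are illustrative assumptions.

```python
import numpy as np
from itertools import combinations
from math import comb
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical stable system and positive definite weights.
A = np.array([[-1.0, 0.5, 0.0],
              [0.0, -2.0, 0.3],
              [0.2, 0.0, -3.0]])
Q = np.eye(3)
L = np.diag([1.0, 2.0, 3.0])
n, p = 3, 2

K0 = solve_continuous_lyapunov(A.T, -Q)   # A^T K + K A + Q = 0
R0 = solve_continuous_lyapunov(A, -L)     # A R + R A^T + L = 0
M0 = K0 @ R0 @ K0

# Eigen-decomposition of the (symmetrized) matrix M0.
d, V = np.linalg.eigh((M0 + M0.T) / 2)

# One projector per choice of p eigenvectors of M0.
projectors = []
for idx in combinations(range(n), p):
    c = V[:, list(idx)].T                 # orthonormal rows spanning an invariant subspace
    projectors.append(c.T @ c)

comm_norms = [np.linalg.norm(C @ M0 - M0 @ C) for C in projectors]
print(len(projectors), comb(n, p), max(comm_norms))
```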


The signature of the extremal points of J. We now evaluate the signature of the extremal points of $J$. We denote by $d^2 J$ the Hessian of the function $J$ with respect to the normal metric; it is a symmetric, bilinear form on $T\mathrm{Sym}(n, p)$ and for the vector fields $X$ and $Y$, it is given by [11]

$$d^2 J(X, Y) = X \cdot Y \cdot J - \nabla_X Y \cdot J. \tag{17}$$

The choice of connection does not affect the type of extremal points of $J$ of course, but it is convenient to fix a connection for the perturbation argument that will be used below. Also note that the Hessian can be used to accelerate gradient flows or algebraic equation solvers [33].

Proposition 2. Let $J = \operatorname{tr}(LK)$ be defined as in Theorem 1. Let $X = \operatorname{ad}_C \Omega_x$ and $Y = \operatorname{ad}_C \Omega_y$ for $\Omega_x, \Omega_y \in \mathfrak{so}(n)$. The Hessian of $J$ with respect to the normal metric is

$$d^2 J(X, Y) = \gamma \operatorname{tr}\left\{ [C, \Omega_x][M, \Omega_y] + [C, VRK + KWK + KRV]\,\Omega_y - \frac{1}{2}[C, M][\Omega_x, \Omega_y] \right\} \tag{18}$$

where $K$ and $R$ are as in the statement of Theorem 1 and $V$ and $W$ are

$$V = \gamma \int_0^\infty e^{(A - \gamma CK)^T t} K[C, \Omega_x]K e^{(A - \gamma CK)t} \, dt$$


$$W = \gamma \int_0^\infty e^{(A - \gamma CK)t} \left( [C, \Omega_x]K - CV - VC - K[C, \Omega_x] \right) e^{(A - \gamma CK)^T t} \, dt.$$

We prove Proposition 2 in the Appendix. The following Corollary makes the analysis of $d^2 J$ tractable for $\gamma$ small.

Corollary 1. Let $K_0$ and $R_0$ be the solutions of

$$A^T K + KA + Q = 0 \tag{19}$$

and

$$AR + RA^T + L = 0 \tag{20}$$

respectively. Let

$$M_0 := K_0 R_0 K_0. \tag{21}$$

For $X = \operatorname{ad}_C \Omega_x$, $Y = \operatorname{ad}_C \Omega_y$, the Hessian of $J$ with respect to the normal metric has the following expansion around $\gamma = 0$:

$$d^2 J(X, Y) \simeq \gamma \operatorname{tr}\left\{ [C, \Omega_x][M_0, \Omega_y] - \frac{1}{2}[C, M_0][\Omega_x, \Omega_y] \right\} + \gamma^2 T(\Omega_x, \Omega_y) \tag{22}$$

where the bilinear form $T$ contains terms of zeroth and higher orders in $\gamma$.

Proof. From Lemma 1, we know that for $\gamma$ small, the stabilizing solution $K$ of the Riccati equation can be expressed as

$$K = K_0 + \text{h.o.t. in } \gamma,$$

where h.o.t. are higher order terms in $\gamma$. Recall that $R$ obeys the equation $(A - \gamma CK)R + R(A - \gamma CK)^T + L = 0$. This is a linear equation and thus its solution, when it exists, depends analytically on $\gamma$ and $C$. Hence, similarly as for $K$, we can write for $\gamma$ small

$$R = R_0 + \text{h.o.t. in } \gamma.$$

We conclude from the above two expansions that we have

$$M \simeq M_0 + \text{h.o.t. in } \gamma. \tag{23}$$

Now recall the explicit expression of $d^2 J$ at $(\gamma, C)$ derived in Proposition 2. A simple calculation shows that the first and last terms of the right hand side of (18) admit the expansions

$$\operatorname{tr}\{ [\gamma C, \Omega_x][M, \Omega_y] \} = \gamma \operatorname{tr}\{ [C, \Omega_x][M_0, \Omega_y] \} + \text{h.o.t. in } \gamma$$

and

$$-\operatorname{tr}\left\{ \frac{1}{2}[\gamma C, M][\Omega_x, \Omega_y] \right\} = -\frac{\gamma}{2} \operatorname{tr}\{ [C, M_0][\Omega_x, \Omega_y] \} + \text{h.o.t. in } \gamma$$

respectively. The second term however, since both $V$ and $W$ have order one in $\gamma$, contributes terms of order at least two in $\gamma$. We thus have the expansion announced. □

We proved in Prop. 1 that $J$ had a finite number of extremal points $C_i(\gamma)$ for $\gamma$ small. The proof went by showing that the extremal points of $J(\gamma, C)$ were the same as the zeros of the vector field $F(\gamma, C) = [C, M]$. The latter could however easily be obtained at $\gamma = 0$. We saw that they were of the form

$$C_i(0) = \Theta^T E_i \Theta$$

where $E_i$ is a diagonal matrix with $p$ entries equal to 1, and the other entries zero, and $\Theta$ is the orthogonal matrix diagonalizing $M_0$ (21). The following Corollary allows us to evaluate the signatures of the extremal points $C_i(\gamma)$:

Corollary 2. Let $M_0$ be as in Eq. (21) and $\Theta^T D \Theta$ be its spectral decomposition. The signature of $d^2 J$ at the extremal point $C_i(\gamma)$, for $\gamma$ small, and $C_i(0) = \Theta^T E \Theta$ where $E$ is a diagonal matrix with $p$ entries equal to 1 and $n - p$ zero, is the same as the signature of the bilinear form

$$H : T_E\mathrm{Sym}(n, p) \times T_E\mathrm{Sym}(n, p) \to \mathbb{R} : (X, Y) \mapsto \operatorname{tr}\{ [E, \Omega_x][D, \Omega_y] \}, \tag{24}$$

where $X = \operatorname{ad}_E \Omega_x$, $Y = \operatorname{ad}_E \Omega_y$ and provided that $H$ is non-degenerate.

Proof. Let $C = C_i(\gamma)$ and $X_1, X_2 \in T_C\mathrm{Sym}(n, p)$ be such that $\operatorname{ad}_C \Omega_i = X_i$ for $i = 1, 2$ for $\Omega_1, \Omega_2 \in (\ker \operatorname{ad}_C)^\perp$. Note that the second term in Eq. (22) came from the expansion of the last term in the Hessian of $J$ (18). For $C_i(\gamma)$ an extremal point, this latter term is zero and thus does not contribute to the approximation given in Eq. (22). Hence the dominating term in the Hessian of $J$ at $C_i(\gamma)$ for $\gamma > 0$ small is the following bilinear form on $T_{C_i(\gamma)}\mathrm{Sym}(n, p)$:

$$\gamma \operatorname{tr}\{ [C_i(\gamma), \Omega_1][M_0, \Omega_2] \}.$$

Because $C_i(\gamma)$ depends continuously on $\gamma$, and because no eigenvalues of $H$ are zero by assumption, a standard argument using continuity of the eigenvalues with respect to $\gamma$ shows that the signatures of the extremal points $C_i(\gamma)$ for $\gamma$ small are the same as the one of $C_i(0)$. Hence the signature of the above bilinear form is the same as the signature of

$$\operatorname{tr}\{ [C_i(0), \Omega_1][M_0, \Omega_2] \}.$$

We can simplify the problem further as follows: as was done in the first part of the proof, let $\Theta$ be the orthogonal matrix whose columns contain the eigenvectors of $M_0$. The cyclic invariance of the trace, Eq. (16) and the fact that $\operatorname{Ad}_\Theta$ is an isomorphism on $\mathfrak{so}(n)$ together imply that the signature of $d^2 J$ at extremal points is the same as the signature of the bilinear form

$$H : T_E\mathrm{Sym}(n, p) \times T_E\mathrm{Sym}(n, p) \to \mathbb{R} : (\Omega_1, \Omega_2) \mapsto \operatorname{tr}\{ [E, \Omega_1][D, \Omega_2] \} \tag{25}$$

as was claimed. □

It thus remains to evaluate the signature of the bilinear form of Eq. (24). We do this first for the case $p = 1$.

The case of scalar observations. We start with the case $p = 1$, which corresponds to having a scalar observation signal. We prove the following Theorem, which covers item 2 of the main result.

Theorem 2. Let $A$ be a stable matrix. For $\gamma > 0$ small and generically for $L$, $Q$ positive definite matrices, the function

$$J(\gamma, C) : \mathbb{R}^+ \times \mathrm{Sym}(n, 1) \to \mathbb{R} : J(\gamma, C) = \operatorname{tr}(LK),$$

where $K$ satisfies the Riccati equation (15), has exactly $n$ extremal points with signatures $(n - 1, 0, 0)$, $(n - 2, 1, 0)$, ..., $(0, n - 1, 0)$ respectively. Moreover, the extremal point of signature $(n - p, p - 1, 0)$ is the orthogonal projection matrix onto the $p$-th highest invariant subspace of $M$.
Proof. From Prop. 1 we know that for $\gamma$ small, the function $J$ has exactly $n$ extremal points.
We now evaluate the signature of the Hessian at these points. Let $E_j$ be the matrix with
zero entries except for the $jj$th entry, which is one. From Corollary 2 it suffices to
evaluate the signatures of the $n$ bilinear forms $H_j$ obtained by letting $E = E_j$, for $j = 1, \ldots, n$,
in $H$ in Eq. (24); if these are non-degenerate, they give the signatures sought.
Assume that the diagonal entries of $D$ are sorted in decreasing order. With this ordering,
the signature of $H_j$ is the same as the signature of $d^2J$ at the extremal point $C_j(\gamma)$, where
$C_j(0) = c_j c_j^\top$ and $c_j$ is the eigenvector associated with the $j$th largest eigenvalue of $M_0$. Recall
that from Lemma 2 an orthonormal basis of the tangent space of $\mathrm{Sym}(n,1)$ at $E_1$ is given
by the commutators of $E_1$ and the $n-1$ matrices $\frac{1}{\sqrt{2}}\Omega_{12}, \frac{1}{\sqrt{2}}\Omega_{13}, \ldots, \frac{1}{\sqrt{2}}\Omega_{1n}$. Note that since
$D$ is diagonal, $[D,\Omega_{1j}] = \Omega_{1j}(d_1 - d_j)$ and thus

$$H_1(\Omega_{1j},\Omega_{1l}) = (d_1 - d_j)\,\delta_{jl}$$

where $\delta_{jl} = 1$ if $j = l$ and zero otherwise. This basis hence diagonalizes $H_1$ and shows that
its eigenvalues are $(d_1 - d_j)$, which are all positive. Thus the signature at $C_1$ is $(n-1,0,0)$. Now
consider the general case of $H_j = \operatorname{tr}\{[E_j,\Omega_1][D,\Omega_2]\}$. An orthonormal basis of the tangent space
at $E_j$ is given by the commutators of $E_j$ and $\frac{1}{\sqrt{2}}\Omega_{jl}$ for $l \in \{1,2,\ldots,\hat{\jmath},\ldots,n\}$, where $\hat{\jmath}$ indicates
that $j$ is omitted from the set. Applying the same approach, we find that the eigenvalues of
$H_j$ are $(d_j - d_l)$, for $l \in \{1,2,\ldots,\hat{\jmath},\ldots,n\}$. Hence $n - j$ eigenvalues are positive and $j - 1$ are
negative. Thus the signature of $H_j$ is $(n-j,j-1,0)$. This concludes the proof. □
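
The signature count in the proof above can be checked numerically on a small instance. The following is a minimal sketch, not part of the original argument: it assumes NumPy, the helper names `comm` and `signature_at_Ej` are hypothetical, and the basis $\frac{1}{\sqrt{2}}\Omega_{jl}$ is built with entry $(j,l)$ equal to $+1$ and entry $(l,j)$ equal to $-1$ (an assumed sign convention).

```python
import numpy as np

def comm(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    return a @ b - b @ a

def signature_at_Ej(d, j):
    """Signature (n_plus, n_minus, n_zero) of H_j(X, Y) = tr{[E_j, Om_x][D, Om_y]}.

    d : diagonal entries of D, assumed pairwise distinct and decreasing.
    j : 1-based position of the single 1 on the diagonal of E_j.
    """
    n = len(d)
    D = np.diag(d)
    E = np.zeros((n, n))
    E[j - 1, j - 1] = 1.0
    # Basis Omega_{jl} / sqrt(2), l != j, of skew-symmetric generators.
    basis = []
    for l in range(n):
        if l == j - 1:
            continue
        om = np.zeros((n, n))
        om[j - 1, l] = 1.0
        om[l, j - 1] = -1.0
        basis.append(om / np.sqrt(2.0))
    # Gram matrix of the bilinear form in this basis.
    H = np.array([[np.trace(comm(E, a) @ comm(D, b)) for b in basis] for a in basis])
    eig = np.linalg.eigvalsh((H + H.T) / 2.0)
    tol = 1e-9
    return (int(np.sum(eig > tol)), int(np.sum(eig < -tol)), int(np.sum(np.abs(eig) <= tol)))

d = [4.0, 3.0, 2.0, 1.0]
signatures = [signature_at_Ej(d, j) for j in range(1, 5)]
```

With these choices, `signatures` should list $(n-j, j-1, 0)$ for $j = 1,\ldots,4$, in agreement with the statement of Theorem 2.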

The case of vector-valued observations. We now address the case $p > 1$. Recall that if $c_1$
and $c_2$ are $p \times n$ matrices with orthonormal rows that span the same $p$-dimensional subspace of
$\mathbb{R}^n$ then, all other things being equal, the estimation errors of the corresponding Kalman filters have
the same statistical properties. Consequently, optimization problems involving the statistical
properties of the estimation error will, when restricted to orthonormal observation vectors,
have loci of extremal values, and all extremal values in the same locus will yield the same
estimation performance.

We now present some combinatorial facts needed to state the main result of this section.
Let $m > 0$ be an integer. Recall that a partition of $m$ with $p$ parts is given by $p$ positive
integers $m_1,\ldots,m_p$ whose sum is $m$. The partition is said to have distinct parts, or to be a
distinct partition, if the integers $m_i$ are pairwise distinct. We denote by $P(p,m)$ the number
of partitions of $m$ into $p$ parts and by $Q(p,m)$ the number of distinct partitions of $m$ into $p$
parts. One can show that

$$Q(p,m) = P\left(p,\; m - \binom{p}{2}\right).$$

See OTl for more properties of P and methods to compute it. 
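
As an illustration of the two counting functions, the sketch below computes $P(p,m)$ with the standard recurrence $P(p,m) = P(p-1,m-1) + P(p,m-p)$ and compares it with a brute-force count of distinct partitions. The function names `P` and `Q_bruteforce` are ours, chosen to mirror the notation above; the identity is checked with the arguments in the order $(p, m)$ used in the definitions.

```python
from functools import lru_cache
from itertools import combinations
from math import comb

@lru_cache(maxsize=None)
def P(p, m):
    """Number of partitions of m into exactly p positive parts."""
    if p == 0:
        return 1 if m == 0 else 0
    if m < p:
        return 0
    # Either some part equals 1 (remove it), or all parts are >= 2
    # (subtract 1 from each of the p parts).
    return P(p - 1, m - 1) + P(p, m - p)

def Q_bruteforce(p, m):
    """Number of partitions of m into p pairwise distinct positive parts."""
    return sum(1 for parts in combinations(range(1, m + 1), p) if sum(parts) == m)

# Example: partitions of 14 into 4 distinct parts versus partitions of 14 - 6 = 8 into 4 parts.
q_value = Q_bruteforce(4, 14)
p_value = P(4, 14 - comb(4, 2))
```

Both `q_value` and `p_value` evaluate to 5 here; the shift by $\binom{p}{2}$ corresponds to subtracting $0, 1, \ldots, p-1$ from the sorted distinct parts.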

Let $d = np - p^2$ denote the dimension of $\mathrm{Sym}(n,p)$. We have the following result:

Theorem 3. With the same assumptions as in Theorem 2, the function

$$J : \mathbb{R}^+ \times \mathrm{Sym}(n,p) \to \mathbb{R} : (\gamma,C) \mapsto \operatorname{tr}(LK)$$

has $\binom{n}{p}$ equilibria. For any pair $(n_+,n_-)$ of non-negative integers such that $n_+ + n_- = d$, there
are $Q\left(p,\, n_+ + \frac{p(p+1)}{2}\right)$ extremal points with index $(n_+,n_-,0)$. In particular, there are unique
extremal points with signatures $(d,0,0)$, $(d-1,1,0)$, $(1,d-1,0)$ and $(0,d,0)$ respectively
and no degenerate extremal points.

The proof of Theorem 3 relies on the following Lemma.

Lemma 3. Let $E$ be a diagonal matrix with diagonal entries 1 and zero and let $X = \mathrm{ad}_E\Omega_x$,
$Y = \mathrm{ad}_E\Omega_y$ be in $T_E\mathrm{Sym}(n,p)$. Define the bilinear form

$$H_E : T_E\mathrm{Sym}(n,p) \times T_E\mathrm{Sym}(n,p) \to \mathbb{R} : (X,Y) \mapsto \operatorname{tr}\{[E,\Omega_x][D,\Omega_y]\}$$

where $D$ is a diagonal matrix with pairwise distinct entries in decreasing order along the
diagonal. Let $m_i$, $i = 1,\ldots,p$ denote the positions of the ones on the diagonal of $E$, i.e.
$E = \sum_{i=1}^{p} E_{m_i}$, and $m = \sum_{i=1}^{p} m_i$. Then the signature of $H_E$ is
$\left(m - \frac{p(p+1)}{2},\; np - \frac{p(p-1)}{2} - m,\; 0\right)$.


The proof of Lemma 3 is in the appendix. Note that the signature of the Hessian is
independent of the exact values of the entries of $D$, provided they are pairwise distinct and
sorted in decreasing order. We first illustrate Lemma 3 on an example. Set $p = 4$ and $n = 7$
and take $E$ to be the diagonal matrix

$$E = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.$$

For this particular $E$, $m_1 = 1$, $m_2 = 3$, $m_3 = 4$, $m_4 = 6$ and thus $m = 14$. Hence the Lemma
says that the bilinear form $H_E$ has a mixed signature $(4,8,0)$.
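
The arithmetic of this example can be reproduced with a few lines of code. The sketch below only evaluates the counting formula as stated in Lemma 3 and the pair count used in its proof; it does not evaluate the bilinear form itself. The function names `lemma3_signature` and `count_pairs` are hypothetical.

```python
def lemma3_signature(n, positions):
    """Signature (m - p(p+1)/2, np - p(p-1)/2 - m, 0) as stated in Lemma 3.

    n         : size of the diagonal matrix E.
    positions : 1-based positions m_1 < ... < m_p of the ones on the diagonal of E.
    """
    p = len(positions)
    m = sum(positions)
    n_plus = m - p * (p + 1) // 2
    n_minus = n * p - p * (p - 1) // 2 - m
    return (n_plus, n_minus, 0)

def count_pairs(positions):
    """Number of pairs (i, j) with i a position of a one, j a position of a zero, and j < i."""
    ones = set(positions)
    return sum(1 for i in ones for j in range(1, i) if j not in ones)

example = lemma3_signature(7, (1, 3, 4, 6))   # (4, 8, 0)
pairs = count_pairs((1, 3, 4, 6))             # 4
```

The two non-zero entries of `example` sum to $d = np - p^2 = 12$, and `pairs` agrees with the first entry, which is the enumeration carried out in the proof of the Lemma.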

The proof of this Theorem is the same as the proof of Theorem 2 save for the evaluation
of the signature of the Hessian. We start by summarizing the major steps leading to
where the two proofs differ. For $\gamma > 0$ small, there is a one-to-one correspondence between
extremal points of $J(\gamma,\cdot)$ and zeros of $[C,M_0]$ where $C = c^\top c$, $M_0 = K_0R_0K_0$ and $K_0, R_0$ are
defined in Eqs. (19) and (20). Because both $C$ and $M_0$ are symmetric, there are $\binom{n}{p}$ such
zeros, corresponding to choices of $p$ eigenvectors of $M_0$. Finally, we have shown that we
can assume, without loss of generality, that $M_0$ is a diagonal matrix and that the Hessian at
the extremal points has the same signature as the bilinear form $H_E$ of Eq. (24).

Proof of Theorem 3. An extremal point of $J$ can thus be characterized by $p$ distinct, positive
integers $m_1,\ldots,m_p$, indicating the positions of the $p$ entries on the diagonal of $E$ that are
equal to 1, the other entries being equal to 0. From Lemma 3 we know that the signature
of $H_E$ is $\left(m - \frac{p(p+1)}{2},\; np - m - \frac{p(p-1)}{2},\; 0\right)$ where $m = m_1 + m_2 + \ldots + m_p$. From the definition
of $Q(p,m)$, we see that the number of extremal points with $n_+$ positive eigenvalues is
$Q\left(p,\, n_+ + \frac{p(p+1)}{2}\right)$ as announced.

In particular, the number of extremal points whose Hessian is negative definite (i.e.
$n_+ = 0$) is equal to the number of partitions of $\frac{p(p+1)}{2}$ into $p$ distinct positive integers. There
is clearly only one such partition, given by $m_1 = 1,\ldots,m_p = p$. Similarly, $m_1 = 1,\ldots,m_{p-1} = p-1$,
$m_p = p+1$ is the only partition of $\frac{p(p+1)}{2} + 1$ into distinct positive integers. Hence
there is also a unique extremal point with $n_+ = 1$. One can show in the same fashion that
there are unique extremal points with $n_- = 0$ and $n_- = 1$. □
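
The counting in this proof can be illustrated by brute force for small $n$ and $p$. The sketch below enumerates all choices of $p$ diagonal positions among $\{1,\ldots,n\}$ and tabulates $n_+ = m - p(p+1)/2$ for each. It relies only on the formula of Lemma 3; the name `index_histogram` is hypothetical. Note that the enumeration restricts the parts $m_i$ to be at most $n$, since they index diagonal positions.

```python
from collections import Counter
from itertools import combinations
from math import comb

def index_histogram(n, p):
    """Map n_plus -> number of 0/1 diagonal matrices E (with p ones) having that n_plus."""
    offset = p * (p + 1) // 2
    return Counter(sum(pos) - offset for pos in combinations(range(1, n + 1), p))

n, p = 5, 2
d = n * p - p ** 2            # dimension of Sym(n, p)
hist = index_histogram(n, p)
total = sum(hist.values())    # number of extremal points, expected comb(n, p)
```

For $n = 5$, $p = 2$ this gives `total == 10` and exactly one extremal point for each of $n_+ \in \{0, 1, d-1, d\}$, consistent with the uniqueness claims of Theorem 3.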

As a corollary of Theorems 2 and 3, we can show that except for a set of measure zero
of initial conditions, the differential equation described in item 4 converges to an optimal
observation matrix. Precisely, we have the following result:

Corollary 3. Let $A$ be a stable matrix and $1 \le p < n$. Let $J : \mathbb{R}^+ \times \mathrm{Sym}(n,p) \to \mathbb{R} :
J(\gamma,C) = \operatorname{tr}(LK)$ where $L$ is positive definite and $K$ is the positive definite solution of the
Riccati equation

$$A^\top K + KA - KCK + Q = 0.$$

Let $R$ be the solution of

$$(A - CK)R + R(A - CK)^\top + L = 0.$$

For $\gamma > 0$ small and generically for $Q, L$ positive definite, the differential equation

$$\dot{C} = [C,[C,M]]$$

with $M = KRK$ converges to a global minimum of $J(\gamma,C)$ from a set of measure one of initial
conditions.
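
To give a feel for the double-bracket equation $\dot{C} = [C,[C,M]]$, the following sketch integrates it in a deliberately simplified setting. Two assumptions are ours and not the paper's: $M$ is frozen to a constant symmetric matrix (in the Corollary, $M = KRK$ depends on $C$ through $K$ and $R$), and the time stepping uses a Cayley transform so that each step is an orthogonal conjugation and the spectrum of $C$ is preserved. The names `cayley_step` and `double_bracket_flow` are hypothetical, and NumPy is assumed.

```python
import numpy as np

def cayley_step(C, M, h):
    """One isospectral step of dC/dt = [C, [C, M]] with M held constant.

    Writing Omega = [M, C] (skew-symmetric), the flow is dC/dt = [Omega, C],
    so C is updated by conjugation with an orthogonal U close to exp(h * Omega).
    """
    n = C.shape[0]
    omega = M @ C - C @ M
    eye = np.eye(n)
    # Cayley transform: U = (I - h/2 Omega)^{-1} (I + h/2 Omega) is orthogonal.
    U = np.linalg.solve(eye - 0.5 * h * omega, eye + 0.5 * h * omega)
    C_new = U @ C @ U.T
    return 0.5 * (C_new + C_new.T)   # remove round-off asymmetry

def double_bracket_flow(C0, M, h=0.05, n_steps=2000):
    """Integrate the frozen-M double-bracket flow starting from C0."""
    C = C0.copy()
    for _ in range(n_steps):
        C = cayley_step(C, M, h)
    return C

M = np.diag([3.0, 2.0, 1.0])               # frozen stand-in for K R K
c0 = np.ones(3) / np.sqrt(3.0)             # unit-norm observation vector
C_final = double_bracket_flow(np.outer(c0, c0), M)
```

In this frozen setting the flow increases $\operatorname{tr}(CM)$, and from a generic rank-one starting point it settles on the projector onto the eigenvector of $M$ with the largest eigenvalue. A faithful implementation of the Corollary would instead recompute $K$, $R$ and $M$ from the Riccati and Lyapunov equations at every step.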


3 Discussion 

We posed and solved the problem of finding the sensor minimizing the estimation error
afforded by the Kalman filter. The methodology proposed is applicable to actuator design
as well. The optimal sensor design problem is difficult in the sense that it is not
convex. We cast the problem as an optimization problem on an isospectral manifold and
equipped this space with a Riemannian metric, called the normal metric. We then evaluated
the gradient and Hessian of the cost function $J$ to be optimized. We have shown that for
$\gamma$ small, where $\gamma$ is the norm of the observation vector or sensor, and a stable infinitesimal
generator $A$ of the dynamics, the gradient flow converges with probability one to the global
minimum. We have restricted the analysis in this paper to the case of orthonormal sensing
matrices. Similar results hold for the general case. They are technically more involved and
we do not elaborate on these here due to space constraints and the fact that most of the main
ideas already appear in the present treatment of the orthonormal case.

We now discuss the role of the assumptions made. The first statement of the main result,
which characterizes optimal observation matrices, holds free of the assumptions that $\gamma$ be
small and the infinitesimal generator $A$ be stable. The second, third and fourth statements,
however, relied on these assumptions. From a practitioner's point of view, how small does
$\gamma$ need to be? We can answer this question using Eq. (18) and the proof of Theorem 2: the
assumption of $\gamma$ small holds for $\gamma < \gamma^*$ where $\gamma^*$ is the smallest $\gamma$ such that the bilinear form

$$\operatorname{tr}\{[C,\Omega_1][M,\Omega_2] + [C, VRK + KWK + KRV]\,\Omega_2\}$$

with $C$ extremal has a zero eigenvalue. Indeed, for $0 < \gamma < \gamma^*$, we then know that the above
bilinear form has no zero eigenvalues and its signature is that of the lowest order term.
Note that $\gamma^*$ depends on $A$ and $Q$. We show in Fig. 1 simulation results, which show that
this assumption holds for rather large $\gamma$ in general. The curves are obtained as follows. We
first set $Q = I_4/\sqrt{4}$. We then sampled four batches of $10^4$ real $4 \times 4$ matrices which are stable
and whose eigenvalues with largest real parts were, depending on the batch, $-0.1$, $-0.5$,
$-1$, or $-3$ (denoted by $\operatorname{Re}\lambda_m$ in the legend). We obtained the samples by drawing matrices
from a Gaussian ensemble and then translating their eigenvalues by adding a multiple of the
identity matrix. For each sample matrix, and for $\gamma$ ranging from $10^{-4}$ to $10$, we searched
for the zeros of the gradient of $J$ numerically and then checked whether the Hessian at that
zero had a signature given by the dominating term. The curves represent the proportion of
matrices, out of the $10^4$ samples, for which $\gamma < \gamma^*$. For example, about 80% of the matrices
in one of the batches were such that $\gamma = 4$ qualifies as small. Unsurprisingly, as the eigenvalues
of $A$ are further away from the imaginary axis, $\gamma^*$ increases and the proportion of matrices
for which $\gamma < \gamma^*$, for $\gamma$ fixed, increases as well. Indeed, for $\operatorname{Re}\lambda_m = -3$, close to 100% of
matrices are such that $\gamma = 4$ qualifies as small.
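
The sampling step described above (draw a random matrix, then translate its spectrum by a multiple of the identity) can be sketched as follows. This is an illustration under assumptions of ours: "Gaussian ensemble" is taken to mean independent standard normal entries, NumPy is assumed, and the function name `sample_stable_matrix` is hypothetical.

```python
import numpy as np

def sample_stable_matrix(n, max_real_part, rng):
    """Draw an n x n Gaussian matrix and shift it so that max Re(eigenvalue) = max_real_part."""
    G = rng.standard_normal((n, n))
    # Adding s * I translates every eigenvalue by s.
    shift = max_real_part - np.max(np.linalg.eigvals(G).real)
    return G + shift * np.eye(n)

rng = np.random.default_rng(0)
targets = (-0.1, -0.5, -1.0, -3.0)
samples = {t: sample_stable_matrix(4, t, rng) for t in targets}
largest_real_parts = {t: np.max(np.linalg.eigvals(A).real) for t, A in samples.items()}
```

Each entry of `largest_real_parts` matches its target up to round-off, so every sampled matrix is stable with the prescribed spectral abscissa. The search for zeros of the gradient and the Hessian signature test are not reproduced here.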

We also assumed that $A$ was stable to reach our conclusions. Note first that Proposition 2,
which provides the Hessian of $J$, holds whether $A$ is stable or not. The assumption was needed
for Lemma 1 to hold when $\gamma = 0$, which in turn allowed us to analyze the Hessian of $J$ via an
expansion of the product $M = KRK$ around $\gamma = 0$. When $A$ is not stable, this expansion does
not hold. Furthermore, it is easy to see that there exist loci of codimension one or two of
observation vectors for which $J(\gamma,c)$ is unbounded. Loci of unbounded values can evidently
not be crossed by a gradient flow. If the loci are all of codimension two, then one might
nevertheless have almost global convergence. Moreover, since the domain $\mathbb{RP}(n-1)$ is not
orientable when $n$ is odd, a locus of codimension one does not necessarily split the domain
into two disconnected parts. Hence, the analysis of the unstable $A$ case requires a careful
analysis of the undetectable modes and the homology class of their eigenspaces.

Figure 1: The assumption $\gamma$ small holds with high probability for large values of $\gamma$.

A rule of thumb for sensor choice. From the proof of Theorem 2 we conclude that a good observation
vector to use is the eigenvector of $M_0$ with the largest eigenvalue (this matrix is defined above), which we denote
by $\gamma c_0$, with $\|c_0\| = 1$. Indeed, this vector is optimal for $\gamma = 0$ and one can hope that it
remains close to optimal as $\gamma$ increases. Note that it is also a good starting point for the
gradient flow. In Fig. 2 we present simulation results that show that this is indeed a sensible
choice when $\gamma$ is small. The curves in Fig. 2 were obtained as follows: for each curve,
we sampled $10^4$ $6 \times 6$ matrices with $\operatorname{Re}\lambda_m$ as indicated in the legend. We let $Q = I_6/\sqrt{6}$.
Denote by $c^*$ the optimal observer obtained for each sample. Each curve represents the
average of $J(\gamma,c_0)/J(\gamma,c^*)$ as a function of $\gamma$ for different values of $\operatorname{Re}\lambda_m$. We see that for
$\gamma$ very close to 0, the performances of $c_0$ and $c^*$ are nearly indistinguishable. As $\gamma$ increases,
the difference becomes more marked, as expected. We also plotted the performance of a
random observer, denoted by $c_r$, which we observe performs predictably worse than both $c^*$
and $c_0$.

Figure 2: Using $\gamma c_0$ as sensor often yields a close-to-optimal performance. A random choice
of sensor (top curve, $c_r$) performs noticeably worse.

A Appendix

A.1 The Hessian of J for the normal metric

We first need an explicit expression for the Levi-Civita connection associated with the normal
metric. We will derive such an expression for the case of constant vector fields. We recall
that a vector field $X$ in $T\mathrm{Sym}(\Lambda)$ is called a constant vector field if it is of the form

$$X = [C,\Omega_x] \tag{26}$$

for a constant $\Omega_x \in \mathfrak{so}(n)$.

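Lemma 4. Let $\langle\cdot,\cdot\rangle$ be the normal metric on $\mathrm{Sym}(\Lambda)$ and let $X = [C,\Omega_x]$, $Y = [C,\Omega_y]$ be
constant vector fields in $T\mathrm{Sym}(\Lambda)$. Then the Levi-Civita covariant derivative of $Y$ along $X$
is

$$\nabla_X Y = \tfrac{1}{2}\,\mathrm{ad}_C[\Omega_x,\Omega_y].$$
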
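Proof. Denote by $\mathcal{L}_X Y$ the Lie derivative of $Y$ in the direction $X$. Recall that the covariant
derivative $\nabla$ obeys the following relation (the Koszul formula):

$$\langle\nabla_X Y, Z\rangle = \tfrac{1}{2}\left\{X\cdot\langle Y,Z\rangle + Y\cdot\langle Z,X\rangle - Z\cdot\langle X,Y\rangle + \langle\mathcal{L}_X Y, Z\rangle - \langle\mathcal{L}_Y Z, X\rangle + \langle\mathcal{L}_Z X, Y\rangle\right\} \tag{27}$$

Let $X = [C,\Omega_x]$, $Y = [C,\Omega_y]$ and $Z = [C,\Omega_z]$ be constant vector fields. A standard calculation
shows that

$$\mathcal{L}_X Y = [C,[\Omega_x,\Omega_y]].$$

Note that

$$X\cdot\langle Y,Z\rangle = X\cdot\operatorname{tr}(\Omega_y\Omega_z) = 0.$$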


We thus have 



The first two terms cancel each other and we obtain

$$\langle\nabla_X Y, Z\rangle = \tfrac{1}{2}\operatorname{tr}\left([\Omega_x,\Omega_y]\,\Omega_z\right).$$

Since the previous equation holds for all $\Omega_z \in \mathfrak{so}(n)$ we obtain

$$\nabla_X Y = \tfrac{1}{2}\,\mathrm{ad}_C[\Omega_x,\Omega_y]$$

as announced. □

We recall the statement of Proposition 2.

Proposition 2. Let $J = \operatorname{tr}(LK)$ be defined as in Theorem 1. Let $X = \mathrm{ad}_C\Omega_x$ and $Y = \mathrm{ad}_C\Omega_y$
for $\Omega_x, \Omega_y \in \mathfrak{so}(n)$. The Hessian of $J$ with respect to the normal metric is

$$d^2J(X,Y) = \gamma\operatorname{tr}\left\{[C,\Omega_x][M,\Omega_y] + [C, VRK + KWK + KRV]\,\Omega_y - \tfrac{1}{2}[C,M][\Omega_x,\Omega_y]\right\} \tag{18}$$

where $K$ and $R$ are as in the statement of Theorem 1 and $V$ and $W$ are

$$V = \gamma\int_0^\infty e^{(A-\gamma CK)^\top t}\,K[C,\Omega_x]K\,e^{(A-\gamma CK)t}\,dt$$

and

$$W = \gamma\int_0^\infty e^{(A-\gamma CK)t}\left([C,\Omega_x]K - CV - VC - K[C,\Omega_x]\right)e^{(A-\gamma CK)^\top t}\,dt.$$

Proof. Let $C \in \mathrm{Sym}(n,p)$ and $X = \mathrm{ad}_C\Omega_x$, $Y = \mathrm{ad}_C\Omega_y$ be constant vector fields. We start
by evaluating the first term in the definition (17) of the Hessian. From the definition of the
gradient and the normal metric, we have that

$$Y\cdot J = \gamma\langle[C,M],\Omega_y\rangle. \tag{28}$$

In order to evaluate the differential of the above function along the vector field $X$, we introduce
the curve

$$C(t) = e^{t\Omega_x}\,C\,e^{-t\Omega_x}.$$

We have

$$X\cdot Y\cdot J = \gamma\left.\frac{d}{dt}\right|_{t=0}\langle[C(t),K(t)R(t)K(t)],\Omega_y\rangle.$$

We have already given an explicit expression for $\left.\frac{d}{dt}\right|_{t=0}K(t)$ in Eqn. (14). We now derive
an expression for $\left.\frac{d}{dt}\right|_{t=0}R(t)$. Recall that $R(t)$ obeys the equation

$$(A - \gamma CK)R + R(A - \gamma CK)^\top + L = 0.$$

Taking the time derivative of both sides, and using again the short-hand $\left.\frac{d}{dt}\right|_{t=0}R = \dot{R}$ and
$\left.\frac{d}{dt}\right|_{t=0}C = \dot{C} = \mathrm{ad}_C\Omega_x$, we obtain

$$(A - \gamma CK)\dot{R} + \dot{R}(A - \gamma CK)^\top - \gamma R(\dot{C}K + C\dot{K}) - \gamma(K\dot{C} + \dot{K}C)R = 0.$$

Setting $S := -R(\dot{C}K + C\dot{K})$, we can write explicitly

$$\dot{R} = \gamma\int_0^\infty e^{(A-\gamma CK)t}\,(S + S^\top)\,e^{(A-\gamma CK)^\top t}\,dt. \tag{29}$$

Gathering the relations above, we have the following expression for $X\cdot Y\cdot J$:

$$X\cdot Y\cdot J = \gamma\left\{\langle[[C,\Omega_x],M],\Omega_y\rangle + \langle[C,\dot{K}RK],\Omega_y\rangle + \langle[C,K\dot{R}K],\Omega_y\rangle + \langle[C,KR\dot{K}],\Omega_y\rangle\right\} \tag{30}$$

where $\dot{K}$ and $\dot{R}$ are given explicitly in (14) and (29) respectively. We now focus on the
second term in Eqn. (17). From Lemma 4 we know that

$$\nabla_X Y = \tfrac{1}{2}[C,[\Omega_x,\Omega_y]].$$

Let $C(t)$ be the curve in $\mathrm{Sym}(\Lambda)$ given by $C(t) = \ldots$ Using the expression
for the gradient of $J$ obtained in Theorem 1 we get

$$\nabla_X Y\cdot J = \tfrac{\gamma}{2}\langle[\Omega_x,\Omega_y],[C,M]\rangle. \tag{31}$$

Using the ad-invariance property of the normal metric, the first term of (30) is equal to
$\gamma\langle[C,\Omega_x],[M,\Omega_y]\rangle$. Now recalling the expression of $d^2J$ given in (17), we obtain the result
using (30) and (31). □

Lemma 5. Let $A$ be a stable matrix and $\gamma > 0$. Define

$$F : \mathbb{R}^+ \times \mathrm{Sym}(n,p) \to T_C\mathrm{Sym}(n,p) : (\gamma,C) \mapsto [C,M]$$

where $K, R$ satisfy the Riccati and Lyapunov equations above and $M = KRK$. The covariant derivative of $F$ at $(0,C)$ and with
respect to its second argument is

$$\nabla_X F = -\tfrac{1}{2}\left([M_0,[C,\Omega_x]] + [\Omega_x,[C,M_0]]\right).$$

Proof. We need to evaluate $\nabla_X F(0,C)$. Observe that $F(0,C)$ is a constant vector field as
defined in (26). From Lemma 4 a short calculation yields

$$\nabla_X F = \tfrac{1}{2}[C,[\Omega_x,M_0]].$$

Using the Jacobi identity, the previous relation can be expressed as

$$\nabla_X F = -\tfrac{1}{2}\left([M_0,[C,\Omega_x]] + [\Omega_x,[C,M_0]]\right)$$

as announced. □


Lemma 3. Let $E$ be a diagonal matrix with diagonal entries 1 and zero and let $X = \mathrm{ad}_E\Omega_x$,
$Y = \mathrm{ad}_E\Omega_y$ be in $T_E\mathrm{Sym}(n,p)$. Define the bilinear form

$$H_E : T_E\mathrm{Sym}(n,p) \times T_E\mathrm{Sym}(n,p) \to \mathbb{R} : (X,Y) \mapsto \operatorname{tr}\{[E,\Omega_x][D,\Omega_y]\}$$

where $D$ is a diagonal matrix with pairwise distinct entries in decreasing order along the
diagonal. Let $m_i$, $i = 1,\ldots,p$ denote the positions of the ones on the diagonal of $E$, i.e.
$E = \sum_{i=1}^{p} E_{m_i}$, and $m = \sum_{i=1}^{p} m_i$. Then the signature of $H_E$ is
$\left(m - \frac{p(p+1)}{2},\; np - \frac{p(p-1)}{2} - m,\; 0\right)$.

Proof of Lemma 3. Recall that the dimension of $\mathrm{Sym}(n,p)$ is $d := pn - p^2$. We first verify
that for $p$ distinct integers $1 \le m_i \le n$ summing to $m$, $m - \frac{p(p+1)}{2} \in \{0,1,\ldots,d\}$. Indeed, on
the one hand the smallest value that $m$ can take is $1 + 2 + \ldots + p = \frac{p(p+1)}{2}$. On the other hand,
the largest value of $m$ is $(n-p+1) + (n-p+2) + \ldots + (n-1) + n$. This last expression is
equal to $p(n-p) + p(p+1)/2$. This proves the claim.

As before, we let $\Omega_{ij}$ be the skew-symmetric matrix with zero entries everywhere except
for the $ij$th entry, which is 1, and the $ji$th entry, which is $-1$, and we let $L_{ij}$ be the symmetric
matrix with zeros everywhere except for the $ij$th and $ji$th entries, which are one. We have
shown in Lemma 2 that the tangent space of $\mathrm{Sym}(n,p)$ at $E$ is spanned by a basis with vectors
$[E,\Omega_{ij}]$ where $i \in \mathcal{M} := \{m_1,\ldots,m_p\}$ and $j$ is in the complement of $\mathcal{M}$ in $\{1,2,\ldots,n\}$,
which we denote $\overline{\mathcal{M}}$. We claim that this basis diagonalizes the bilinear form $H_E$. To see
this, first note that

$$[D,\Omega_{ij}] = (d_i - d_j)L_{ij}.$$

Second, an easy calculation shows that for $i > j$

$$[E,\Omega_{ij}] = \begin{cases} L_{ij} & \text{if } i \in \mathcal{M} \text{ and } j \in \overline{\mathcal{M}} \\ 0 & \text{otherwise.} \end{cases}$$

Because $\operatorname{tr}(L_{ij}L_{kl}) = 2$ if $i = k$ and $j = l$ and zero otherwise, we conclude that

$$H_E(\Omega_{ij},\Omega_{kl}) = 2(d_i - d_j)\,\delta_{ik}\delta_{jl}.$$


The bilinear form is non-degenerate because the $d_i$'s are distinct. They are sorted in decreasing
order, i.e. $d_i - d_j > 0$ if and only if $i > j$. Thus, the number of positive eigenvalues of $H_E$
is equal to the number of integer pairs $(i,j) \in \mathcal{M} \times \overline{\mathcal{M}}$ with $i > j$. We can enumerate such
pairs as follows: for $i = m_1$, any $j \in \{1,\ldots,m_1 - 1\}$ is such that the above requirement on
the pair $(i,j)$ is satisfied. There are $m_1 - 1$ such $j$'s. For $i = m_2$, the requirement holds for
any $j \in \{1,\ldots,m_1 - 1, m_1 + 1,\ldots,m_2 - 1\}$. There are $m_2 - 2$ such $j$'s. Generally, up to and including $i = m_l$,
there are $m_1 + m_2 + \ldots + m_l - 1 - 2 - \ldots - l$ pairs passing the requirement. Hence there is
a total of $m - \frac{p(p+1)}{2}$ positive eigenvalues as announced. □

The following Lemma is used to show that the matrix $M_0$ used in the main part of the
paper generically has distinct eigenvalues, and thus a unique basis of orthonormal eigenvectors.


Lemma 6. Let $A \in \mathbb{R}^{n\times n}$ be a stable matrix and $Q, L$ be positive definite symmetric matrices.
Let $K, R \in \mathbb{R}^{n\times n}$ be the unique positive definite solutions of

$$\begin{aligned} AK + KA^\top + Q &= 0 \\ AR + RA^\top + L &= 0 \end{aligned} \tag{32}$$

Then generically for $Q, L$ positive definite, the matrix $M := KRK$ has distinct eigenvalues.

Proof. We first recall that since $A$ is stable, the Lyapunov equations in (32) each have a
unique symmetric positive definite solution. We denote them by $\mathscr{L}(Q)$ and $\mathscr{L}(L)$ respectively,
i.e.

$$\mathscr{L}(Q) = \int_0^\infty e^{At}\,Q\,e^{A^\top t}\,dt.$$

We let $S^+$ be the set of symmetric positive definite matrices of dimension $n$ and define the
map $F : S^+ \times S^+ \to S^+ : (Q,L) \mapsto M$ where $M = KRK$ with $K = \mathscr{L}(Q)$ and $R = \mathscr{L}(L)$. The proof of the
Lemma goes by showing that for a generic point $(Q,L) \in S^+ \times S^+$, $F$ is locally surjective,
i.e. $F$ maps small enough neighborhoods of $(Q,L)$ onto neighborhoods of $F(Q,L)$. From
there, the statement of the Lemma follows from a simple contradiction argument. Indeed,
assume that $F$ is locally surjective but that there exists an open set $V \subset S^+ \times S^+$ for which
$F(V)$ only contains matrices with non-distinct eigenvalues. The set of matrices in $S^+$ which
have non-distinct eigenvalues is of measure zero and thus for any pair $(Q,L)$ in $V$, $F$ is not
locally surjective, a contradiction.

The remainder of the proof is dedicated to showing that $F$ is generically locally surjective
(g.l.s.). To this end, note that an open map is clearly g.l.s. and that the composition of
generically locally surjective maps is likewise g.l.s. To see that this last statement holds,
assume that $f_1 : M \to N$ and $f_2 : N \to P$ are g.l.s. and let $f_3 = f_2 \circ f_1$. Let $C_1 \subset M$
(resp. $C_2 \subset N$) be the set of points where $f_1$ (resp. $f_2$) is not locally surjective and let
$D = \{x \in M \mid f_1(x) \in C_2\}$. The sets $C_1$ and $C_2$ are of measure zero by assumption and by the
same argument as in the paragraph above, $D$ is of measure zero in $M$. Thus for $x \notin C_1 \cup D$,
$f_3$ is locally surjective at $x$ and since $C_1 \cup D$ is of measure zero, $f_3$ is g.l.s.

We now return to the main thread. Let $G : \mathbb{R}^{n\times n} \times \mathbb{R}^{n\times n} \to \mathbb{R}^{n\times n} : G(K,R) = M$.
Then we can write $F$ as the composition $F = G \circ (\mathscr{L}(Q),\mathscr{L}(L))$. The operator $\mathscr{L}^{-1}(X) =
AX + XA^\top$ is nothing more than the Lyapunov operator. One can show that for $A$ stable, the
Lyapunov operator is of full rank (observe that its eigenvalues are pairwise sums of eigenvalues
of $A$). Its inverse $\mathscr{L}$ is thus a full rank linear map and consequently an open map. By
a standard argument, one can show that the map $(Q,L) \mapsto (\mathscr{L}(Q),\mathscr{L}(L))$ is also an open
map. The map $G$ is a polynomial map and is clearly surjective. If we can show that $G$ is
g.l.s., then $F$ is the composition of g.l.s. maps and is thus g.l.s., which proves the Lemma.

It thus remains to show that $G$ is g.l.s. To see this, first recall that at points $(K,R)$ in the
domain of $G$ where its linearization is full rank, $G$ is locally surjective. Now assume that
there is an open set $V$ in the domain of $G$ where its linearization is nowhere full rank. Then
the determinant characterizing full rank of the linearization vanishes on the open set $V$ and, because this determinant is a polynomial function,
it is zero everywhere. By Sard's Theorem [22], the set $W$ over which the linearization of $G$
is not full rank is such that $G(W)$ has measure zero. But we have just shown that $W$ is the
entire domain of $G$, which contradicts the fact that $G$ is surjective. This ends the proof of
the Lemma. □
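
Lemma 6 can be illustrated numerically on a random instance. The sketch below is only a sanity check under assumptions of ours: the Lyapunov equations (32) are solved by vectorization with Kronecker products (a generic technique, not one used in the paper), the matrices $A$, $Q$, $L$ are random draws with a fixed seed, and the helper name `lyapunov_solve` is hypothetical. NumPy is assumed.

```python
import numpy as np

def lyapunov_solve(A, Q):
    """Solve A X + X A^T + Q = 0 for X, assuming A is stable.

    With row-major vectorization, vec(A X) = kron(A, I) vec(X) and
    vec(X A^T) = kron(I, A) vec(X).
    """
    n = A.shape[0]
    eye = np.eye(n)
    op = np.kron(A, eye) + np.kron(eye, A)
    X = np.linalg.solve(op, -Q.reshape(-1)).reshape(n, n)
    return 0.5 * (X + X.T)   # remove round-off asymmetry

rng = np.random.default_rng(0)
n = 4
G = rng.standard_normal((n, n))
A = G - (np.max(np.linalg.eigvals(G).real) + 1.0) * np.eye(n)   # largest real part is -1
B1 = rng.standard_normal((n, n))
B2 = rng.standard_normal((n, n))
Q = B1 @ B1.T + np.eye(n)    # positive definite
L = B2 @ B2.T + np.eye(n)    # positive definite

K = lyapunov_solve(A, Q)
R = lyapunov_solve(A, L)
M = K @ R @ K
eigs = np.linalg.eigvalsh(0.5 * (M + M.T))
min_gap = np.min(np.diff(eigs))   # strictly positive when the eigenvalues are distinct
```

A single draw proves nothing about genericity, but it shows the objects of the Lemma concretely: `K` and `R` are positive definite and the sorted eigenvalues of `M` are separated by a positive `min_gap`.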